We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along common dimensions that support comparison. Several lines of research have proposed what we refer to as measurements, with differing terminology; we bring some of this work together, particularly in fields of computer vision and language, and build from it to motivate measuring data as a critical component of responsible AI development. Measuring data aids in systematically building and analyzing machine learning (ML) data towards specific goals and gaining better control of what modern ML systems will learn. We conclude with a discussion of the many avenues of future work, the limitations of data measurements, and how to leverage these measurement approaches in research and practice.
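As a rough illustration (not the paper's own taxonomy), the sketch below computes a few dataset-level measurements over a toy text-classification dataset; the specific quantities and function names are our own illustrative choices.

```python
# Minimal sketch of dataset-level "measurements" along common, comparable
# dimensions; the measurement names here are illustrative, not the paper's.
from collections import Counter
from statistics import mean


def measure(examples):
    """Return a few comparable measurements for an iterable of (text, label) pairs."""
    texts, labels = zip(*examples)
    token_counts = [len(t.split()) for t in texts]
    vocab = {tok for t in texts for tok in t.split()}
    label_dist = Counter(labels)
    return {
        "num_examples": len(texts),
        "mean_tokens_per_example": mean(token_counts),
        "vocabulary_size": len(vocab),
        "label_distribution": {k: v / len(labels) for k, v in label_dist.items()},
    }


if __name__ == "__main__":
    toy = [("the cat sat on the mat", "pos"),
           ("dogs bark loudly at night", "neg"),
           ("the dog sat quietly", "pos")]
    print(measure(toy))
```

Measurements of this kind can be computed for any dataset of the same type, which is what makes comparison across datasets possible.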
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
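Since the abstract states that the models and code are released publicly, a minimal usage sketch is given below, assuming the Hugging Face transformers library and the released BLOOM checkpoints on the Hub; the smaller bigscience/bloom-560m variant stands in for the full 176B model purely to keep the example lightweight.

```python
# Minimal sketch: load a released BLOOM checkpoint and generate a continuation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-560m"  # the full 176B model is "bigscience/bloom"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```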
Machine learning (ML) research has generally focused on models, while the most prominent datasets are used for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of those datasets to the underlying problem. Neglecting the fundamental importance of datasets has led to major issues, including data cascades in real-world applications and saturation of model quality on dataset-driven benchmarks, and has hindered research growth. To address this, we present DataPerf, a benchmark suite for evaluating ML datasets and dataset-centric algorithms. We intend to enable a "data ratchet," in which training sets help evaluate test sets on the same problem, and vice versa. This feedback-driven strategy will generate a virtuous cycle that accelerates data-centric AI. DataPerf will be maintained by the MLCommons Association.
In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding examples that models struggle to predict correctly. Models trained on DADC-collected data have been shown to be more robust in adversarial and out-of-domain settings, and are harder for humans to fool. However, DADC is more time-consuming than traditional data collection and therefore more expensive per example. In this work, we examine whether we can retain the advantages of DADC without incurring the additional cost. To this end, we introduce Generative Annotation Assistants (GAAs), generator-in-the-loop models that provide real-time suggestions that annotators can approve, modify, or reject entirely. We collect training datasets in 20 experimental settings and perform a detailed analysis of this approach for the task of extractive question answering (QA) under both standard and adversarial data collection. We show that GAAs provide significant efficiency gains in annotation speed while leading to improved model-fooling rates. In addition, we show that GAA-assisted data leads to higher downstream model performance on a variety of question answering tasks.
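The sketch below outlines one GAA-assisted annotation step for extractive QA; generator_suggest, annotator_review, and target_model_is_fooled are hypothetical stand-ins for the components described in the abstract, not the authors' implementation.

```python
# Schematic sketch of a generator-in-the-loop annotation step (hypothetical helpers).
def collect_example(passage, question, generator_suggest, annotator_review,
                    target_model_is_fooled):
    """One GAA-assisted extractive-QA annotation step."""
    # 1. The generative assistant proposes an answer span in real time.
    suggestion = generator_suggest(passage, question)

    # 2. The human annotator approves, edits, or rejects the suggestion.
    decision, answer = annotator_review(passage, question, suggestion)
    if decision == "reject":
        return None  # the annotator writes the example from scratch instead

    # 3. In the adversarial setting, keep the example only if it fools the target QA model.
    if target_model_is_fooled(passage, question, answer):
        return {"passage": passage, "question": question, "answer": answer}
    return None
```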
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic pretraining to obtain good performance on a variety of downstream tasks. Such models are typically either cross-modal (contrastive) or multi-modal (with early fusion), but not both, and they often target only specific modalities or tasks. A promising direction is a single holistic universal model, as a "foundation", that targets all modalities at once: a true vision and language foundation model should perform well on vision tasks, language tasks, and cross- and multi-modal vision-and-language tasks. We introduce FLAVA as such a model and demonstrate impressive performance on a broad range of 35 tasks spanning these target modalities.
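A minimal sketch of running FLAVA on a paired image-text input, assuming the Hugging Face transformers implementation and the public facebook/flava-full checkpoint:

```python
# Minimal sketch: one forward pass of FLAVA on a paired image-text input.
from PIL import Image
from transformers import FlavaModel, FlavaProcessor

processor = FlavaProcessor.from_pretrained("facebook/flava-full")
model = FlavaModel.from_pretrained("facebook/flava-full")

image = Image.new("RGB", (224, 224))  # placeholder image for illustration
inputs = processor(text=["a photo of a cat"], images=[image],
                   return_tensors="pt", padding=True)

# A single forward pass produces unimodal (image, text) and multimodal
# representations that downstream task heads can build on.
outputs = model(**inputs)
```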
To create models that are robust across a wide range of test inputs, training datasets should include diverse examples spanning many phenomena. Dynamic adversarial data collection (DADC), in which annotators craft examples that challenge continually improving models, holds promise as an approach for generating such diverse training sets. Prior work has shown that running DADC for 1-3 rounds can help models fix some error types, but it does not necessarily lead to better generalization beyond adversarial test data. We argue that running DADC over many rounds maximizes its training-time benefits, since the different rounds can together cover many task-relevant phenomena. We present the first study of longer-term DADC, in which we collect 20 rounds of NLI examples for a small set of premise paragraphs, using both adversarial and non-adversarial approaches. Models trained on DADC examples make 26% fewer errors on our expert-curated test set than models trained on non-adversarial data. Our analysis shows that DADC yields examples that are more difficult, more lexically and syntactically diverse, and contain fewer annotation artifacts than non-adversarial examples.
We introduce a new large-scale NLI benchmark dataset, collected via an iterative, adversarial human-and-model-in-the-loop procedure. We show that training models on this new dataset leads to state-of-the-art performance on a variety of popular NLI benchmarks, while posing a more difficult challenge with its new test set. Our analysis sheds light on the shortcomings of current state-of-the-art models, and shows that non-expert annotators are successful at finding their weaknesses. The data collection method can be applied in a never-ending learning scenario, becoming a moving target for NLU, rather than a static benchmark that will quickly saturate.
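The sketch below outlines one round of the human-and-model-in-the-loop procedure; current_model, annotator_write_example, and verify_label are hypothetical stand-ins, and the actual protocol includes further verification steps not shown here.

```python
# Schematic sketch of one adversarial NLI collection round (hypothetical helpers).
def collect_round(premises, current_model, annotator_write_example, verify_label):
    """Collect NLI examples that fool the current model; they train the next round's model."""
    collected = []
    for premise in premises:
        # The annotator writes a hypothesis together with the label they intend it to have.
        hypothesis, target_label = annotator_write_example(premise)
        prediction = current_model(premise, hypothesis)
        # Keep the example only if the model is fooled and independent humans
        # confirm that the intended label is correct.
        if prediction != target_label and verify_label(premise, hypothesis) == target_label:
            collected.append((premise, hypothesis, target_label))
    return collected
```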
Chit-chat models are known to have several problems: they lack specificity, do not display a consistent personality and are often not very captivating. In this work we present the task of making chit-chat more engaging by conditioning on profile information. We collect data and train models to (i) condition on their given profile information; and (ii) information about the person they are talking to, resulting in improved dialogues, as measured by next utterance prediction. Since (ii) is initially unknown, our model is trained to engage its partner with personal topics, and we show the resulting dialogue can be used to predict profile information about the interlocutors.
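A schematic sketch of persona-conditioned next-utterance prediction framed as candidate ranking; score_candidate is a hypothetical stand-in for a trained ranking model, not the paper's architecture.

```python
# Schematic sketch: pick the reply that best fits the persona and dialogue so far.
def predict_next_utterance(persona, history, candidates, score_candidate):
    """Rank candidate replies conditioned on profile information and dialogue history."""
    # Condition on the speaker's profile by prepending it to the dialogue context.
    context = " ".join(persona) + " " + " ".join(history)
    return max(candidates, key=lambda reply: score_candidate(context, reply))
```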
In haptic teleoperation of multi-limbed mobile manipulators, the problem of mitigating the destabilizing effects arising from the communication link between the haptic device and the remote robot has not been properly addressed. In this work, we propose a passivity-based control architecture for haptically teleoperating a legged mobile manipulator while remaining stable in the presence of time delays and frequency mismatches between the master and slave controllers. On the master side, a discrete-time energy modulation of the control input is proposed. On the slave side, passivity constraints are included in an optimization-based whole-body controller to satisfy the energy limitations. A hybrid kinematic-dynamic teleoperation scheme allows the human operator to remotely operate the robot's end effector in manipulation mode, as well as its base velocity in base-motion mode. The resulting control architecture is demonstrated on a quadrupedal robot with artificial delays added to the network.
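As a generic illustration of the discrete-time energy-modulation idea (an energy-tank style passivity layer, not the paper's specific formulation), a master-side command could be scaled as follows; all names and thresholds are illustrative.

```python
# Generic sketch: scale the commanded force whenever transmitting it would
# drain more energy than the tank currently holds (illustrative only).
def modulate_command(force_cmd, master_velocity, tank_energy, dt, e_min=0.0):
    """Scale the master-side command so the energy tank never drops below e_min."""
    energy_out = force_cmd * master_velocity * dt  # energy leaving the tank this sample
    budget = tank_energy - e_min                   # energy available for dissipation
    if energy_out <= 0.0 or energy_out <= budget:
        # Energy is flowing back into the tank, or the tank can cover the outflow.
        return force_cmd, tank_energy - energy_out
    # Otherwise, scale the command so only the remaining budget is spent.
    scale = max(0.0, budget / energy_out)
    return scale * force_cmd, tank_energy - scale * energy_out
```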